3.4 How to generate time-aggregated values - collapse

In addition to aggregating data into a higher unit level¹, e.g. from person level into family level (or municipality level), the command collapse may also be used as a tool for statistical measurements aggregated over a specified time span. In practice, this is the same as aggregating data from event/longitudinal level into person level. Examples may be calculations of a state duration measured over a given time interval, retrieval of status in a given time interval, retrieval of number of occurrences in given states in a given time interval, or summation of values over a given time interval.

This is done on event-organized data sets (see section 2.3.2) through the following command:

collapse (<aggregate measure>) <dataset>, by(<unit-id>)

Type of aggregation is required as input in the parenthesis following collapse, and then the name on an event organized dataset. Aggregation type may be as follows:

max maximum value
min minimum value
mean mean value
median median valuei
count number of values
sum sum of values
semean standard error of mean value
sebinomial binomial standard error of mean value
sd standard deviation
percent percentage valid values
iqr interquartile range (range between 75th and 25th percentiles)

The option by(<unit-id>) is used to specify which unit type to aggregate over. This will usually be individuals, given by the unit identification number contained by the key variabel PERSONID_1.

Example 1: Calculate the number of times the individuals have changed their marital status during 2000-2005

 require no.ssb.fdb:23 as db

create-dataset maritalevent
import-event db/SIVSTANDFDT_SIVSTAND 2000-01-01 to 2005-01-01 as
maritalperiod
collapse (count) maritalperiod, by(PERSONID_1)

rename maritalperiod maritalstates
replace maritalstates = maritalstates - 1

tabulate maritalstates

Example 2: Calculate the number of divorces per individual during 2000-2005

 require no.ssb.fdb:12 as db

create-dataset maritalevent
import-event db/SIVSTANDFDT_SIVSTAND 2000-01-01 to 2005-01-01 as
maritalperiod
keep if maritalperiod == '4'
collapse (count) maritalperiod, by(PERSONID_1)

rename maritalperiod divorces

tabulate divorces

Note that the variable maritalperiod initially contains data on marital status (each new record represents a change in marital status). However, through the steps in the examples, the variable is transformed from containing event level data into containing the count-value measured over the 2000-2005 period for the specific unit level (= individual). Thus, following the collapse-procedure, the variable maritalperiod will now contain the number of marital statuses measured per individual over the period (example 1) or the number of divorces per individual measured over the same period (= the number of records containing the value '4' which represents the status "divorced") (example 2).

NB! In order to be able to continue working with the aggregated value generated through collapse, the dataset needs to be linked with the other variables placed in the main analysis dataset built through the import-procedure (see section 2.3.1). See section 2.8 on how to do this.

See section 2.8 ↩

Footnotes​

Footnotes